Incorporating Prior Knowledge in Deep Learning Models via Pathway Activity Autoencoders
Motivation: Despite advances in the computational analysis of high-throughput
molecular profiling assays (e.g. transcriptomics), a dichotomy exists between
methods that are simple and interpretable, and ones that are complex but less
interpretable. Furthermore, very few methods attempt to express
interpretability in biologically relevant terms, such as known pathway
cascades. Biological pathways reflect signalling events or metabolic
conversions. Determining which pathways are implicated in
disease and incorporating such pathway data as prior knowledge may enhance
predictive modelling and personalised strategies for diagnosis, treatment and
prevention of disease.
Results: We propose a novel prior-knowledge-based deep auto-encoding
framework, PAAE, together with its accompanying generative variant, PAVAE, for
RNA-seq data in cancer. Through comprehensive comparisons among various
learning models, we show that, despite having access to a smaller set of
features, our PAAE and PAVAE models achieve better out-of-set reconstruction
results compared to common methodologies. Furthermore, we compare our models
with equivalent baselines on a classification task and show that they achieve
better results than models which have access to the full input gene set.
We also find that using vanilla variational frameworks might negatively
impact both reconstruction outputs and classification performance.
Finally, our work contributes comprehensive interpretability analyses of our
models, in addition to improving prognostication for translational medicine.
Aprendendo medidas de centralidade com redes grafo-neurais (Learning centrality measures with graph neural networks)
Centrality measures are important metrics in Social Network Analysis. They allow one to infer which entity in a network is more central (informally, more important) than another. Analyses based on centrality measures may help detect possible social influencers, security weak spots, etc. This dissertation investigates methods for learning to predict these centrality measures using only the graph's structure. More specifically, it presents different ways of ranking the vertices according to their centrality measures, as well as a brief analysis of how to approximate the centrality measures themselves. This builds on previous work that used neural networks to estimate centrality measures given other centrality measures. We use the concept of a Graph Neural Network, a Deep Learning model that builds its computation graph according to the topology of the input graph. The models' performance is evaluated with different centrality measures and briefly compared with other machine learning models in the literature. We evaluate both the approximation and the ranking of the centrality measures and show that the ranking is easier to compute. The transfer between the tasks of predicting these different centralities is analysed, and the advantages of each model are highlighted. The models are tested on graphs drawn from random distributions different from those they were trained on, on graphs larger than those seen during training, and on real-world instances that are much larger than the largest training graphs. The internal vertex embeddings produced by the model are analysed through lower-dimensional projections, and conjectures are made about the behaviour seen in the experiments.
Finally, we identify possible future work suggested by the experimental results presented here.
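The distinction the abstract draws between approximating a centrality value and ranking vertices by it can be illustrated with a small sketch. This is not the dissertation's model or evaluation code: degree centrality is used only because it is simple to compute, the toy graph and the `predicted` scores are made up to stand in for a model's output, and the pairwise ranking accuracy is one illustrative metric among several possible ones.

```python
from itertools import combinations


def degree_centrality(adjacency):
    """Normalized degree centrality: degree / (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {v: 0.0 for v in adjacency}
    return {v: len(neighbours) / (n - 1) for v, neighbours in adjacency.items()}


def rank_vertices(scores):
    """Vertices sorted from most to least central; ties broken by name."""
    return sorted(scores, key=lambda v: (-scores[v], v))


def pairwise_ranking_accuracy(true_scores, predicted_scores):
    """Fraction of vertex pairs (with distinct true scores) ordered correctly."""
    correct = 0
    total = 0
    for u, v in combinations(true_scores, 2):
        if true_scores[u] == true_scores[v]:
            continue  # ties carry no ordering information
        total += 1
        true_order = true_scores[u] > true_scores[v]
        predicted_order = predicted_scores[u] > predicted_scores[v]
        if true_order == predicted_order:
            correct += 1
    return correct / total if total else 1.0


# Hypothetical undirected toy graph as an adjacency mapping.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

true_scores = degree_centrality(graph)

# Made-up scores standing in for a learned model's predictions.
predicted = {"a": 0.7, "b": 0.5, "c": 0.4, "d": 0.1}

ranking = rank_vertices(true_scores)
accuracy = pairwise_ranking_accuracy(true_scores, predicted)
```

In this toy case the predicted values are numerically far from the true centralities, yet every informative pair of vertices is ordered correctly, which is one intuition for why ranking can be an easier target than approximation.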
NEOTROPICAL ALIEN MAMMALS: a data set of occurrence and abundance of alien mammals in the Neotropics
Biological invasion is one of the main threats to native biodiversity. For a species to become invasive, it must be voluntarily or involuntarily introduced by humans into a nonnative habitat. Mammals were among the first taxa to be introduced worldwide for game, meat, and labor, yet the number of species introduced in the Neotropics remains unknown. In this data set, we make available occurrence and abundance data on mammal species that (1) transposed a geographical barrier and (2) were voluntarily or involuntarily introduced by humans into the Neotropics. Our data set is composed of 73,738 historical and current georeferenced records on alien mammal species, of which around 96% correspond to occurrence data on 77 species belonging to eight orders and 26 families. Data cover 26 continental countries in the Neotropics, ranging from Mexico and its frontier regions (southern Florida and coastal-central Florida in the southeast United States) to Argentina, Paraguay, Chile, and Uruguay, and the 13 countries of the Caribbean islands. Our data set also includes neotropical species (e.g., Callithrix sp., Myocastor coypus, Nasua nasua) considered alien in particular areas of the Neotropics. The species with the most records are Bos sp. (n = 37,782), Sus scrofa (n = 6,730), and Canis familiaris (n = 10,084); 17 species were represented by only one record (e.g., Syncerus caffer, Cervus timorensis, Cervus unicolor, Canis latrans). Primates have the highest number of species in the data set (n = 20 species), partly because of uncertainties regarding taxonomic identification of the genus Callithrix, which includes the species Callithrix aurita, Callithrix flaviceps, Callithrix geoffroyi, Callithrix jacchus, Callithrix kuhlii, Callithrix penicillata, and their hybrids. This unique data set will be a valuable source of information for invasion risk assessments, biodiversity redistribution and conservation-related research. There are no copyright restrictions.
Please cite this data paper when using the data in publications. We also request that researchers and teachers inform us of how they are using the data.